Privacy-Preserving Technologies for Ridesharing
The emergence of mobile phones and connected objects has profoundly changed our daily lives. These devices, thanks to the multitude of sensors they embed, give access to a broad spectrum of services. In particular, position sensors have contributed to the development of location-based services such as navigation, ridesharing, and real-time congestion tracking.
Despite the convenience offered by these services, the collection and processing of location data seriously infringe on users' privacy. Indeed, these data can inform service providers about users' points of interest (home, workplace, sexual orientation), habits, and social network. In general, users' privacy can be protected by legal or technical means. While legal measures may deter service providers and malicious individuals from infringing users' privacy rights, their effects are only observable once an offense has already been committed and detected. In contrast, using privacy-enhancing technologies (PETs) from the design phase of a system reduces the success rate of attacks on users' privacy.
The main objective of this thesis is to demonstrate the viability of PETs as a means of protecting location data in ridesharing services. This type of location-based service, by letting drivers share the empty seats in their vehicles, helps reduce congestion, CO2 emissions, and dependence on fossil fuels. In this thesis, we study the itinerary-synchronization and matching problems of ridesharing, with explicit consideration of the constraints on protecting location data (origin, destination).
The solutions proposed in this thesis combine multimodal routing algorithms with several privacy-enhancing technologies, such as homomorphic encryption, private set intersection, secret sharing, and secure integer comparison. They guarantee privacy properties including anonymity, unlinkability, and data minimization. In addition, they are compared to conventional solutions that do not protect privacy. Our experiments indicate that location data protection constraints can be taken into account in ridesharing services without degrading their performance.
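To make one of these building blocks concrete, here is a minimal sketch of additive secret sharing, one of the techniques the thesis combines (alongside homomorphic encryption and private set intersection). Everything here is illustrative rather than the thesis's actual protocol: a user splits a value (e.g. a scaled coordinate) into random shares so that no single party learns it, yet shares can still be combined and even added component-wise.

```python
# Toy additive secret sharing over a fixed modulus (an assumption of this
# sketch, not the thesis's parameters). No proper subset of shares reveals
# anything about the secret.
import secrets

MOD = 2**32  # illustrative working modulus


def share(value: int, n_parties: int = 2) -> list[int]:
    """Split `value` into n additive shares modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MOD
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine the shares; all of them are required."""
    return sum(shares) % MOD


coord = 488_566  # e.g. a scaled latitude; illustrative value
s = share(coord, n_parties=3)
assert reconstruct(s) == coord

# Shares add component-wise: parties obtain shares of a sum (say, a
# distance accumulator) without ever seeing either input.
a, b = 10, 32
sa, sb = share(a), share(b)
summed = [(x + y) % MOD for x, y in zip(sa, sb)]
assert reconstruct(summed) == a + b
```

The additive-homomorphic property in the last step is what lets matching computations (distances, comparisons) proceed on shared data rather than on cleartext origins and destinations.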
Fairness Under Demographic Scarce Regime
Most existing works on fairness assume the model has full access to
demographic information. However, there exist scenarios where demographic
information is partially available because a record was not maintained
throughout data collection or due to privacy reasons. This setting is known as
the demographic scarce regime. Prior research has shown that training an attribute
classifier to replace the missing sensitive attributes (proxy) can still
improve fairness. However, the use of proxy-sensitive attributes worsens
fairness-accuracy trade-offs compared to true sensitive attributes. To address
this limitation, we propose a framework to build attribute classifiers that
achieve better fairness-accuracy trade-offs. Our method introduces uncertainty
awareness in the attribute classifier and enforces fairness on samples with
demographic information inferred with the lowest uncertainty. We show
empirically that enforcing fairness constraints on samples with uncertain
sensitive attributes is detrimental to fairness and accuracy. Our experiments
on two datasets showed that the proposed framework yields models with
significantly better fairness-accuracy trade-offs compared to classic attribute
classifiers. Surprisingly, our framework outperforms models trained with
constraints on the true sensitive attributes.
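The selection step described above can be sketched in a few lines: infer a proxy sensitive attribute, score each prediction's uncertainty, and keep only low-uncertainty samples for the fairness constraint. The data, threshold, and "classifier" below are synthetic stand-ins, not the paper's actual models.

```python
# Uncertainty-aware proxy attribute selection, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical proxy attribute classifier output: P(sensitive = 1 | x).
p_sensitive = rng.uniform(0.0, 1.0, size=n)

# Binary-entropy uncertainty (in bits) of each inferred attribute.
eps = 1e-12
entropy = -(p_sensitive * np.log2(p_sensitive + eps)
            + (1 - p_sensitive) * np.log2(1 - p_sensitive + eps))

# Enforce fairness only where the inferred attribute is confident.
threshold = 0.5  # illustrative cut-off on entropy
confident = entropy < threshold
proxy_attr = (p_sensitive[confident] > 0.5).astype(int)

# Demographic-parity gap on the confident subset, using some downstream
# model's decisions (random here, purely for shape).
decisions = rng.integers(0, 2, size=n)[confident]
gap = abs(decisions[proxy_attr == 1].mean() - decisions[proxy_attr == 0].mean())
print(f"{confident.mean():.0%} of samples kept; parity gap = {gap:.3f}")
```

In a real pipeline the parity gap on the confident subset would feed a fairness regularizer during training; samples whose inferred attribute is too uncertain are simply excluded from that constraint.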
Probabilistic Dataset Reconstruction from Interpretable Models
Interpretability is often pointed out as a key requirement for trustworthy
machine learning. However, learning and releasing models that are inherently
interpretable leaks information regarding the underlying training data. As such
disclosure may directly conflict with privacy, a precise quantification of the
privacy impact of such a breach is a fundamental problem. For instance, previous
work has shown that the structure of a decision tree can be leveraged to build
a probabilistic reconstruction of its training dataset, with the uncertainty of
the reconstruction being a relevant metric for the information leak. In this
paper, we propose a novel framework generalizing these probabilistic
reconstructions in the sense that it can handle other forms of interpretable
models and more generic types of knowledge. In addition, we demonstrate that
under realistic assumptions regarding the interpretable models' structure, the
uncertainty of the reconstruction can be computed efficiently. Finally, we
illustrate the applicability of our approach on both decision trees and rule
lists, by comparing the theoretical information leak associated with either exact
or heuristic learning algorithms. Our results suggest that optimal
interpretable models are often more compact and leak less information regarding
their training data than greedily built ones, for a given accuracy level.
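A toy illustration of the uncertainty metric discussed above, under strong simplifying assumptions (binary features, and a tree that only reveals which leaf each training example fell into). Each root-to-leaf path fixes the features it tested; untested features remain unknown, so an example's reconstruction uncertainty is log2 of the number of feature vectors still compatible with its leaf. All names and values here are hypothetical.

```python
# Per-example reconstruction uncertainty from a decision tree's structure.
N_FEATURES = 4

# Hypothetical tree: each leaf maps to the (feature -> value) tests on its path.
leaves = {
    "leaf_a": {0: 1, 2: 0},        # path tested features 0 and 2
    "leaf_b": {0: 0},              # path tested feature 0 only
    "leaf_c": {0: 1, 2: 1, 3: 0},  # path tested features 0, 2, 3
}

# Which leaf each of 5 training examples reached (assumed known, e.g.
# because the tree reports per-leaf sample counts).
assignment = ["leaf_a", "leaf_b", "leaf_b", "leaf_c", "leaf_a"]


def example_uncertainty(path: dict) -> float:
    """Bits of uncertainty: one bit per binary feature the path never tested."""
    free = N_FEATURES - len(path)
    return float(free)  # log2(2**free)


total_bits = sum(example_uncertainty(leaves[l]) for l in assignment)
print(f"dataset reconstruction uncertainty: {total_bits:.0f} bits")
```

A deeper, more specific tree leaves fewer free features per example, hence fewer bits of uncertainty: the model's structure itself leaks information about the training data, which is exactly the quantity the framework above measures for richer model classes.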
Adversarial training approach for local data debiasing
The widespread use of automated decision processes in many areas of our
society raises serious ethical issues concerning the fairness of the process
and the possible resulting discriminations. In this work, we propose a novel
approach called GANsan whose objective is to prevent the possibility of any
discrimination (i.e., direct and indirect) based on a sensitive attribute by
removing the attribute itself as well as the existing correlations with the
remaining attributes. Our sanitization algorithm GANsan is partially inspired
by the powerful framework of generative adversarial networks (in particular the
Cycle-GANs), which offers a flexible way to learn a distribution empirically or
to translate between two different distributions.
In contrast to prior work, one of the strengths of our approach is that the
sanitization is performed in the same space as the original data by only
modifying the other attributes as little as possible and thus preserving the
interpretability of the sanitized data. As a consequence, once the sanitizer is
trained, it can be applied to new data, for instance locally by an
individual on their profile before releasing it. Finally, experiments on a real
dataset demonstrate the effectiveness of the proposed approach as well as the
achievable trade-off between fairness and utility.
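The adversarial objective can be sketched with much simpler pieces than the Cycle-GANs used above: here the sanitizer is a plain linear map and the adversary is logistic regression, trained in alternation on synthetic data. The sanitizer stays close to the original data (preserving interpretability, as in GANsan) while making the sensitive attribute hard to predict. All hyper-parameters are illustrative.

```python
# Toy adversarial debiasing: linear sanitizer vs. logistic adversary.
import numpy as np

rng = np.random.default_rng(1)
n = 400
s = rng.integers(0, 2, size=n).astype(float)         # sensitive attribute
X = rng.normal(size=(n, 2)) + np.outer(s, [2.0, 0])  # feature 0 leaks s

A = np.eye(2)                # sanitizer: x -> A x (starts as identity)
theta, b = np.zeros(2), 0.0  # adversary: logistic regression on sanitized data
lam, lr = 0.5, 0.1           # reconstruction weight, learning rate

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for step in range(500):
    Z = X @ A.T                     # sanitized data
    p = sigmoid(Z @ theta + b)      # adversary's P(s = 1 | z)
    # Adversary step: minimize cross-entropy for predicting s.
    theta -= lr * (Z.T @ (p - s)) / n
    b -= lr * np.mean(p - s)
    # Sanitizer step: maximize adversary loss, keep Z close to X.
    gZ = -np.outer(p - s, theta) / n + 2 * lam * (Z - X) / n
    A -= lr * (gZ.T @ X)

Z = X @ A.T
acc = np.mean((sigmoid(Z @ theta + b) > 0.5) == (s > 0.5))
print(f"adversary accuracy on sanitized data: {acc:.2f}")
```

The reconstruction term (weighted by `lam`) is what keeps the sanitized data in the same space as the original, so the trained sanitizer can later be applied locally to a new profile, as the paragraph above describes.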
Robustness in Machine Learning Explanations: Does It Matter?
The explainable AI literature contains multiple notions of what an explanation is and what desiderata explanations should satisfy. One implicit source of disagreement is how far explanations should reflect real patterns in the data or the world. This disagreement underlies debates about other desiderata, such as how robust explanations are to slight perturbations in the input data. I argue that robustness is desirable to the extent that we're concerned about finding real patterns in the world. The import of real patterns differs according to the problem context. In some contexts, non-robust explanations can constitute a moral hazard. By being clear about the extent to which we care about capturing real patterns, we can also determine whether the Rashomon Effect is a boon or a bane.
Privacy Enhancing Technologies for Ridesharing
The ubiquitous world in which we live has fostered the development of technologies that take into account the context in which users operate in order to deliver high-quality services. For example, by providing personalized services based on users' positions, location-based services (LBS) have encouraged the emergence of ridesharing services. This success has come at the cost of user privacy. Indeed, in current ridesharing services, users are not in control of their own location data and have to trust the ridesharing operators with the management of their data. In this paper, we aim at developing privacy-preserving ridesharing systems. Our first experiments showed that privacy can be improved without sacrificing the utility of the delivered service.